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Effects of sequence disorder on looping and cyclization of the double-stranded DNA are studied 
theoretically. Both random intrinsic curvature and inhomogeneous bending rigidity are found to 
result in a remarkably wide distribution of cyclization probabilities. For short DNA segments, the 
range of the distribution reaches several orders of magnitude for even completely random sequences. 
The ensemble averaged values of the cyclization probability are also calculated, and the connection 
to the recent experiments is discussed. 
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I. INTRODUCTION 



Bending of double-stranded (ds) DNA molecules into 
loops is important for such biological processes as regu- 
lation of gene expression and DNA packaging into nucle- 
osomes [l[ Understanding the underlying physics of 
DNA looping is necessary for any quantitative descrip- 
tion of these biological processes. While the classical 
theory based on the elastic description of the DNA was 
proposed more than two decades ago Q , there are multi- 
ple indications that it poorly describes the experimental 
situation. In fact, the discrepancy between this theory 
and the experiment 0, S 0, S 111 often reaches several 
orders of magnitude. Partially, this difference can be 
attributed to multiple complications arising from the in 
vivo conditions. However, recent experiments performed 
in vitro still show substantial inadequacy of the existing 
theoretical description. Several recent models attributed 
this discrepancy to the possibility of local disruptions 
of the ds-DNA structure such as "kinks" and "bub- 
bles" nil- However, both the classical description 
and the recent extensions ignore the sequence disorder, 
an intrinsic property of a typical DNA. 

In this paper, we discuss the effect of sequence disor- 
der on loop formation in DNA. The inhomogeneity of the 
DNA structure is inherent to its biological function. In 
nature, it is likely that sequences of the DNA segments 
involved in looping have evolved to optimize their bend- 
ing properties. Here we limit ourselves to the study of 
completely random DNA sequences similar to those in 
recent experiments [1, H 9- Even in this simplest case, 
sequence-dependent effects lead to a remarkably wide dis- 
tribution of the expected values of the looping probabil- 
ity and thus must be taken into account in any realistic 
description. 



II. DNA LOOPING PROBLEM 

The conformational properties of ds-DNA are com- 
monly described by the worm-like chain (WLC) model, 
which proved to be particularly successful in accounting 
for the results of the single- molecule DNA stretching [l^] . 
Within this model, the DNA is viewed as an inextensible 
elastic rode. Let L be the length of the chain, s be the 
coordinate along the chain {0 < s < L), r(s) be the con- 
formation of the chain, t(s) = dr/ds be the unit tangent 
vector at s, and k(s) = d^r/ds^ be the chain curvature 
at s. Then the effective Hamiltonian of the WLC model 
can be written as a sum of the bending and torsional 
energies: 



WLC 



[k] 



kT 



— [ [k(s)]^ + torsion. 
2 Jo 



(1) 



Here and below, Ip is the persistence length, a parameter 
proportional to the effective bending modulus and play- 
ing the role of the orientational correlation length of a free 
chain. A similar parameter for the torsional contribution 
is known as the torsional persistence length It. The ex- 
perimental values of these parameters are Ip « 50 nm 
and It ~ 70-110 nm under physiological conditions. 

The classical theory of DNA looping based on the WLC 
model was developed by Shimada and Yamakawa 
Looping probability is proportional to the so-called J- 
factor, which is defined as the effective concentration of 
one of the chain ends in the vicinity of the other. For- 
mally, the J-factor is given as the canonical ensemble 
probability of the cyclized state: 



J 



exp(-ip)i3[k] 
/mexp(-^ 



(2) 
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where the path integrations are over all possible confor- 
mations of the chain. Function i?[k] in the numerator 
represents a set boundary constraints in a given physi- 
cal situation. In particular, one can distinguish between 
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protein-mediated looping (common for gene regulation) 
and cyclization (achieved by hybridization of the comple- 
mentary single-stranded ends of a DNA segment). The 
general case of DNA looping may involve rather compli- 
cated set of boundary conditions. In this paper we focus 
on the problem with boundary constraints 



S[k] = 5[r(L)-r(0)]<5[t(L)-t(0)] 



(3) 



i.e. we assume alignment of both positions [r(L) = i"(0)] 
and orientations [t{L) — t(0)] of the chain ends. [Here 
r and t are understood as functionals of k(s).] These 
boundary constraints are commonly employed to model 
DNA cyclization. In this case, the Shimada-Yamakawa 
result for L < Alp is given by [3;] 



exp 



(4) 

where Ci and C2 are known numerical coefficients. Fac- 
tor Jt is the contribution of the torsional constraints giv- 
ing rise to the rapid oscillations of Jsy{L) with the pe- 
riod of the helix repeat h = 10.5 bp. For short DNA 
molecules, the overall result is dominated by the bending 
energy E/kT = 2'K^lp/L of the ground state (a circle). 
The prefactor and the subdominant C2-term are of en- 
tropic origin. Modifications of the torsional contribution 
due to the sequence disorder are a smaller effect and will 
not be discussed in this paper. Thus, Jt will be assumed 
to have its classical form and omitted from all the ex- 
pressions below for simplicity. 

Cyclization properties of DNA molecules were studied 
experimentally by several groups [1, H, H, 0) Hi ■ The J- 
factor is typically measured as the ratio of the equilibrium 
constants for chain cyclization and bimolecular associ- 
ation. Substantial quantitative disagreement with the 
Shimada-Yamakawa theory was reported for short chain 
lengths, with experimental results showing significantly 
higher values of the J-factor than allowed by the theory 
(up to several orders of magnitude). In early works [3, Q 
this disagreement was attributed to the difference be- 
tween the in vivo and in vitro conditions. However, 
the latest experiments |6i, lS, i8| conducted in vitro still 
indicate considerable deficiency of the classical theory. 
Cloutier and Widom 0, 0] reported cyclization proba- 
bilities four orders of magnitude higher than predicted 
theoretically. At the same time, the results of Du et 
al. were different from those of Cloutier and Widom 
and closer to the classical prediction. Nevertheless, their 
data could only be fitted by using the value of the per- 
sistence length as a fitting parameter. Thus, the cycliza- 
tion probability appears substantially underestimated by 
the classical theory for short DNA chains, and different 
groups report seemingly different experimental results. 

Two effects of the sequence disorder are studied in this 
paper. First, the local structure of the ds-DNA can be 
characterized by the sequence-dependent intrinsic curva- 
ture ko(s). Second, structural inhomogeneity naturally 
results in variations of the bending modulus along the 



chain. This can be accounted for by the s-dependent per- 
sistence length lp{s). The corresponding generalization 
of the WLC model is described by the following effective 
Hamiltonian: 



H[k] 
kT 



lp{s) [k(s) — ko(s)]^ ds + torsion. (5) 



Looping and cyclization occur on the mesoscopic scales, 
with characteristic lengths larger than a single base pair 
and comparable to the persistence length. The above 
Hamiltonian is particularly suitable in this regime, where 
the elastic description is still valid, while the inhomogene- 
ity of the elastic parameters is significant. When ko(s) is 
identically zero and lp{s) is a constant, this Hamiltonian 
reduces to the WLC Hamiltonian llj. We discuss the 
two sequence-disorder effects separately in the following 
two sections, and then combine them in the Discussion 
section. 



III. EFFECT OF RANDOM INTRINSIC 
CURVATURE 



First, we consider the effect of the sequence-dependent 
intrinsic curvature and use the effective Hamiltonian of 
Eq. ([5]) with constant Ip. Each DNA molecule is not a 
zero-thickness one-dimensional line; it has a finite cross- 
section instead. In general, an arbitrary cross-section can 
be characterized by two principal axes with two different 
values of the intrinsic curvature along each of them. An 
important point is that these principal directions are de- 
termined by the structure of the molecule and do not 
depend on the molecular conformation. In particular, 
the two principal axes have generally no relation to the 
normal and the binormal of a particular conformation. 
Thus, a DNA chain can be characterized by two pairs 
of the orthonormal vectors in the plain normal to the 
tangent: pair ei and 62 is firmly affixed to the internal 
structure, and pair n and b is a function of its conforma- 
tion. 

Vector of the intrinsic curvature ko can be decomposed 
along the principal axes at every point along the chain: 
kg = kiei + k2e2. Since looping and cyclization occur on 
scales much longer than the base pair length, the distri- 
bution of the intrinsic curvature w[ki,k2] for a random 
sequence is Gaussian with zero average due to the Central 
Limit Theorem: 



;[/ci,/c2] = JJexp 



// h{s)S{s- s')ki{s')dsds'' 



2k 



(6) 

i.e. the intrinsic curvature is delta-correlated along the 
chain: 

(fcj(s)fcj(s'))ko = uSijdis - s'). (7) 
Here and below, angular brackets with subscript kg refer 
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to the averaging with distribution ([6]) : 



(/[fcl,fe])ko = 



// DkiDk2w[ki,k2]f[ki,k2 

// DkiDk2w[ki,k2] 



(8) 



Note that the asymmetry of the cross-section averages 
out on the scale of a few base pairs due to the hehcal 
structure of ds-DNA, and thus we assume the value of 
parameter k to be the same for both principal compo- 
nents. From the consensus-scale data of Gabrielian et al. 
for various base-pair combinations (isj we found the value 
of parameter k « 0.13/^p for several long (> 10000 bp) 
random sequences by using their "Bend It" DNA tools 
server. Note that a recent numerical study by Rappaport 
and Rabin also considered effects of the spontaneous 
sequence-dependent curvature; however, most of their re- 
sults arose from the non-zero average intrinsic curvature 
assumed in that work. In our model all the effects are 
solely due to the random contribution and completely 
average out for a sufficiently long chain. 

Remarkably, one can obtain a simple and general re- 
sult for the J-factor averaged over the ensemble with the 
above statistics. The unconstrained partition function 
in the denominator of Eq. ^ is essentially a Gaussian 
integral, which is independent of the mean value of the 
Gaussian. Thus, the denominator is independent of the 
intrinsic curvature, and hence the averaging over the dis- 
order affects only the numerator. Averaging over disor- 
der kg is interchangeable with path integration over k, 
and only the Boltzmann factor depends on kg. By aver- 
aging the Boltzmann factor in the numerator of Eq. ^ 
according to Eq. the original Boltzmann factor with 
a renormalized persistence length can be recovered: 



(exp 



exp 



2 1 + Klp 



kT ' 



(1 + «:?p)i/« 



(9) 



independently of the angle between n and ei at every 
point along the chain. Here, ^ is the curvature correla- 
tion length of the order of a few base pairs. Since the de- 
nominator of Eq. is a Gaussian integral, it scales with 

the persistence length as Ip (times an /p-independent 
factor). Thus, this denominator can be combined with 
(1 -I- Klp)^/^ of expression ([9]) to recover the same denom- 
inator, but as a function of the renormalized persistence 
length: [(1 -|- Klp)/lp]^^^ (times the same /p-independent 
factor). Thus, the following exact result for the cycliza- 
tion probability can be obtained: 



{J{L,lp))kg = Jsy{LJ*), 



where /* 



1 + kL 



(10) 



is the renormalized persistence length. As a result of 
our averaging over the disorder, we recovered the original 
Shimada-Yamakawa expression, but with I* instead of Ip. 
The renormalized persistence length /* is lower than the 
original one Ip, and thus the DNA molecule is softer and 
easier to bend. We emphasize that this result is valid 



for any boundary constraints i?[k]; it is not necessarily 
limited to looping or cyclization problems and hence very 
general. Note that a necessary condition of this theorem 
is that the open configuration [described by the denomi- 
nator of Eq. ([2])] must be completely unconstrained, e.g. 
it cannot be under an external force. However, this the- 
orem can be applied to DNA stretching experiments. In 
this case the stretched chain corresponds to the numer- 
ator of Eq. ([2]) with boundary constraints B[k] repre- 
senting the condition of the fixed end-to-end distance. 
Thus, the same renormalization does take place in the 
stretc hing exp eriments, as was previously discussed in 
Refs. 151 Il6l [TtI. Therefore, value of the persistence 
length extracted from such experiments (~ 50 nm) is I* 
rather than /p, and thus ( J)ko actually coincides with the 
Shimada-Yamakawa result, plotted for the experimental 
values of this parameter. 

However, ( J)ko is not the only characteristics of the cy- 
clization probability for random sequences, and the width 
of the distribution of J turns out to be equally impor- 
tant. The latter can be deduced from the comparison 
of log (J) ko with (logJ)ko. Moreover, if the distribution 
of log J is Gaussian, then the average coincides with the 
maximum of probability and hence (log J)ko is the value 
most likely to be observed when plotting the experimen- 
tal data on the logarithmic scale. Finally, (log J)k(, has 
a clear physical meaning of (the negative of) the aver- 
age free energy difference between the looped and the 
unconstrained states. 

Quantity (log J)kQ can be calculated in a fashion simi- 
lar to (J)ko- For short chains, the average free energy dif- 
ference between the looped and the unconstrained states 
is dominated by the average difference in the energy of 
the ground state. The energy of the ground state is given 
by the Hamiltonian minimized with respect to the angle 
between n and ei. Thus, conducting the averaging over 
the disorder of the minimized Hamiltonian, one can ob- 
tain the following result for (log J)ko (for short enough 
chains): 



(fog •^>ko 



2t:^Ip 



2ttk ^ / L 

—^""r^-p 



(11) 



The first (dominant) term is the bending energy of the 
circle as in the Shimada-Yamakawa result. The second 
term originates from the cross-term in the Hamiltonian 
and reflects the fact that a finite-length segment has a 
non-zero mean intrinsic curvature of the order of k/L 
despite the ensemble-average curvature being zero. This 
correction raises the expectation value of log J (although 
the result is still lower than log(J)ko foi' short chains, as 
expected). For longer chains (log J)ko = fog('^)ko due to 
self-averaging. This result for long chains and result (fTT|) 
for short chains can be interpolated to the entire range 
of the chain lengths. The result of this interpolation 
(which is asymptotically correct in both limits) is plotted 
in Fig.[T]together with the sequence-averaged (J(L))ko of 
Eq. (fTO|) (which coincides with the Shimada-Yamakawa 
result). 
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FIG. 1: Effect of the intrinsic curvature on the cyclization 
probability. The contribution of the torsional constraints to 
the J-factor is omitted. The dashed line represents the aver- 
age value of the J-factor over the intrinsic curvature [Eq. (|10|) ] , 
which coincides with the Shimada-Yamakawa result for the 
experimental values of the persistence length. The solid line 
is the average value of log J, i.e. the interpolation between 
Eq. ITT} for short chains and log(J)ko for long chains. The 
width of the distribution is indicated by the grey shading; it 
reaches an order of magnitude for L = lp/2. Two thirds of 
random sequences are supposed to lie within the shaded area. 



For weak enough disorder the free energy and therefore 
log J of the cyclized state have a Gaussian distribution. 
In this case, its width 6 can be inferred from the two 
averages S"^ — 2 (log(J)ko ~ (log^)ko) and is also shown 
in Fig. [TJ This width is substantial, and the distribution 
spans an order of magnitude for short chains. 



IV. EFFECT OF INHOMOGENEOUS BENDING 
RIGIDITY 



The second effect of the sequence disorder deals with 
the inhomogcncous bending rigidity and can be described 
by the effective Hamiltonian of Eq. ([5]) with zero intrinsic 
curvature ko but s-dependent persistence length Ip. We 
limit our discussion to the case of the isotropic bending 
[described by a single local bending modulus kTlp{s)], 
although a more general exposition should allow for the 
anisotropy of the local bending modulus as well. 

The major effect is related to the correction to the en- 
ergy of the ground state. Consider a chain segment of 
length L bent by a small angle A9. Since the intrinsic 
curvature is zero at all points, the chain segment is con- 
fined to a plain and can be described by angle 0{s) in that 
plain. Torque lp{s)d9/ds is constant along the chain and 

equal to A9/ /J" l~^{s)ds. Hence the bending energy can 



be written as 

Ipjs) fd9 
ds 



E 
kf 



or 







E 
kf 



ds = 



lp{s) {A9fds 



lp{s) lo lp\s')ds' 



2xL ' 



where 



Ip 



lp{s) 



(12) 



is a dimensionless parameter. Here and below we distin- 
guish between the ensemble- wide harmonic average per- 
sistence length Ip = {lp^)~^ and the local s-dependent 
value lp{s). Equation p2| implies that the chain seg- 
ment can be described by the effective bending modulus 
kTlp/x instead of kTlp. 

In full analogy with the case of the intrinsic curvature, 
it is the harmonic average that is relevant to both DNA 
cyclization/looping and DNA stretching problems, as was 
first pointed out by Wiggins This observation is 

valid when the correlation length of the disorder is much 
smaller than the persistence length, which is indeed a 
typical situation for DNA. Mathematically, the origin of 
the harmonic average is the same as in the problem of 
several Hookean springs connected in series, where the 
effective spring constant for the series is the harmonic 
average of the respective spring constants. 

The concept of the effective bending modulus can 
now be applied to the cyclization problem by impos- 
ing the constraint t{L) — t(0), i.e. assuming A6' — 2ti 
in Eq. (|T2p . Strictly speaking, the other constraint 
v{L) — r(0) is not automatically satisfied as in the uni- 
form case. However, the corresponding correction to the 
overall bending energy is weaker than the effect of the 
renormalization of the bending modulus and hence ne- 
glected (see Appendix for details). We also ignore any 
corrections to the subdominant (for L < Ip) entropic 
terms in the Shimada-Yamakawa result. Thus, the only 
modification of the expression for J is in the ground state 
bending energy: 



J{L) = JsY{L)eKY> 



2t:H 



'-I 
L \x 



p 



(13) 



Now, all statistical properties of the J-factor are com- 
pletely determined by those of quantity x. Random quan- 
tity lp^{s) in the definition of x is correlated on the length 
scale Al of the order of a few base pairs, which is much 
shorter than L. Thus, according to the Central Limit 
Theorem, the distribution of x is Gaussian. Parame- 
ters of this distribution are fully defined by the average 
and the second-order correlator of lp^{s). By definition, 
(Zpi(s)) = \ therefore (x) = 1. Since Al < L, the 
correlator of lp^{s) can be written as 



Lp 



Here a is a dimensionless parameter related to the cor- 
relation length Al and the second moment of the local 
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distribution of / ^(s): 



(15) 



This parameter can be estimated from a set of the 
consensus-scale data of Muntcanu et al. 19]. Their data 
hsts sequence-dependent Young's modulus for all 64 pos- 
sible trinucleotides. We find that {[ip^ {s) — lp^\ ) ~ 
0.13Z~^. Assuming A/ « 3 bp and Ip k, 150 bp, we find 
a ~ 0.13AZ//p K, 0.0025. Note that a trinucleotid may 
be too approximate a candidate for the correlation length 
A^, and hence the above value of coefficient a is just an 
order-of-magnitude estimate. From Eq. ^4]) we can ob- 
tain the dispersion of x: 

a^^{{x-{x)f) 



(16) 

Thus, the overall (normalized) distribution of x is: 



0(x) 



2naL 



■ exp 



^ {--If 



2al 



(17) 



Note that we did not assume the statistics of lp^{s) to 
be Gaussian in this derivation. 

Based on the statistics of x, one can obtain the distri- 
bution of J of Eq. US]): 




0.5 1.0 1.5 

y-factor j-J/J^.^ 



2.5 



FIG. 2: Probability distribution (O of the J-factor ^ for 
various values of the chain length: L{1~^) = 2.0, 1.4, 1.0, 
0.7, and 0.5. As the chain length decreases, the distribution 
evolves from nearly Gaussian to significantly non-Gaussian, 
with a strong tail developing at large values of J. The contri- 
bution of these large but rare values raises the average value 
of the J-factor substantially. 



where xq is the saddle point: 



Xo 



7 + 1 + 



(21) 



P{j) 



■ exp 



I 

2a 



27r2 — I log j 



j{2n^~l\ogj) 

(18) 

where j = J / Jsy and / = L/lp. This distribution is plot- 
ted in Fig. [2] for several values of L. It turns out to be 
quite peculiar: one can observe a striking crossover from 
a nearly Gaussian shape to a significantly non-Gaussian 
one as the chain length decreases. For short chains the 
distribution behaves as a power law 1/ J over an increas- 
ing range of values of J, giving rise to a very long tail. As 
a result, the average ( J);^ is dominated by large but rare 
values. Here and below, angular brackets with subscript 
Ip refer to the averaging with distribution (|17p : 



7 = + 1 + + 



1/3 



and 5 — 21a 



Tdp 



For short chains [shorter than (27a)"'^/^7rZp, or about 
0.82y, (5 » 1 and xo « {25)^'^/?,. Thus, the short- 
length behavior of this leading factor is dramatically dif- 
ferent in its functional form from exp (— 27r^^p/L) of the 
Shimada-Yamakawa result: 



(exp 



E 



1 



277r'^ I 



2a L 



1/3- 



(22) 



{f{x))i, = / f{x)<P{x)dx, 



(19) 



or, equivalently, (/(j))/^ = J f{j)p{j)dj. 

An analytical expression for {J)ip can be obtained 
from the distribution. Conducting an integration by the 
method of the steepest descent, one can obtain the aver- 
age of the leading factor (exp {—E/kT))i^: 



(exp 



E 
kf 



exp 



27r Ip 
Lxo 



2al„ 



l + t(^)" 



(20) 



The full expression for {J{L))i^ is plotted in Fig. [3] to- 
gether with Jsy{L). The average (J);^ significantly ex- 
ceeds the classical value Jsy for short chains. 

The characteristic range of the distribution of J is in- 
dicated in Fig. [3] by shading the area between the graphs 
corresponding to x = 1 -I- cr and x = 1 — cr in expres- 
sion . This shaded area includes 68% of random se- 
quences since the distribution of x is Gaussian and J{x) is 
a monotonic function of x. For short chains the widening 
of the probability distribution due to the inhomogeneous 
bending rigidity is even stronger than due to the intrinsic 
curvature. 
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Chain length L <r > 

FIG. 3: Effect of the sequence-dependent bending rigidity on 
the cycUzation probabiUty. The contribution of the torsional 
constraints to the J-factor is omitted. The dashed line rep- 
resents the Shimada-Yamakawa result. The solid line is the 
average value of the J-factor over the bending rigidity, which 
is strongly enhanced for short chains. The width of the dis- 
tribution is indicated by the grey shading; it exceeds an order 
of magnitude for short chains (and almost vanishes for long 
chains) . 



V. DISCUSSION 

Here we combine the contributions of both random in- 
trinsic curvature and inhomogeneous bending rigidity. In 
Fig. m we present the overall result and restore torsional 
effects giving rise to the oscillatory behavior of J{L) . The 
theoretical curves are plotted in real physical units and 
compared to experimental data. The value of J averaged 
over both types of disorder (J(L))kQ,ip and the typical 
range of its distribution arc shown as functions of the 
chain length. 

The discrepancy between the theory and the experi- 
mental data of Refs. 0, 0] remains statistically signifi- 
cant. While this deviation might be attributed to the 
effects of "kinks" Q and "bubbles" 0, HH. we note 
that the overall theoretical result strongly depends on 
the choice of the boundary conditions. The current ex- 
periments do not ensure that the ends of the looped 
segment are perfectly aligned [t{L) = t(0)]. Rather, 
the cyclization is achieved by the hybridization of the 
complementary single-stranded ends. Upon this process 
(and before the ligation) each of the two strands is not 
structurally continuous since there is no covalent bond- 
ing involved. As a result, the flexibility on both sides 
of the newly hybridized region is higher than elsewhere 
along the double-stranded loop and could be modeled 
with modified boundary conditions. The extreme case 
of a completely free joint at the location of the newly 
hybridized region has already been solved by Shimada 
and Yamakawa in Ref. Q. The result of this calcula- 
tion (also shown in Fig.[?]by the dotted curve) is in bet- 
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FIG. 4: Overall effect of the sequence disorder on the cycliza- 
tion probability. The torsional contribution to the J-factor 
is taken into account, and physical units are restored. The 
dashed line is the classical Shimada- Yamakwa result. The 
average of the J-factor over both types of sequence disorder 
is represented by the solid line. The distribution range is 
indicated by the grey shading; it spans three to four orders 
of magnitude for chains shorter than 2/3 of the persistence 
length. Comparison to the experimental data of Cloutier and 
Widom 01 and Du et al. [3 is made. While the absolute 
values of the experimental data of Cloutier and Widom are 
not reached for 68% of random sequences, the spread of their 
data is consistent with our prediction. The dotted line repre- 
sents the J-factor for a loop with unconstrained orientations 
of the ends. It indicates that the boundary conditions may 
be the clue to the discrepancy in absolute values. 



ter agreement with the experimental data of Refs. [1, 0]- 
A detailed analysis of a modified model with intermedi- 
ate boundary conditions (neither completely free nor per- 
fectly aligned tangent vectors) is conducted in Ref [20]. 

The overall width of the distribution reaches three or- 
ders of magnitude in the range of lengths of Fig. H) This 
predicted width is consistent with the spread of the ex- 
perimental data of Cloutier and Widom 0]. On the 
other hand, we cannot make a conclusive comparison 
of the calculated width with the experimental data of 
Refs. d, [2l| due to insufl&cient statistics. 

It should be emphasized that both the classical theory 
by Shimada and Yamakawa and its derivatives implic- 
itly assumed the property of self- averaging: the looping 
probability (J-factor) of a "typical" DNA segment was 
expected to follow a certain function of its length L, at 
least in the first approximation. The major result of our 
work is that there is no self-averaging in the problem of 
looping and cyclization in the most interesting regime of 
short chains [L < Ip). The J-factor is not a well-defined 
function of the length, not even to the first approxima- 
tion. In other words, there is no "typical" DNA segment 
in that regime. The lack of self-averaging is indicated by 
extremely wide range of the distribution of the J-factors 
and by strongly non-Gaussian (power-law-like) behavior 
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of that distribution. This constitutes a drastic modifi- 
cation to the original theory (and any of its derivatives) 
and has far reaching impUcations for the future studies 
of the problem. This also makes DNA looping and cy- 
clization dramatically different from the traditional DNA 
stretching problem, where the property of self-averaging 
is automatically satisfied. The notion of the probability 
of looping of short DNA segments is simply meaningless 
outside the context of a particular DNA sequence. 

Due to this lack of self-averaging, future experiments 
should involve a sufficiently large ensemble of sequences 
of the same length in order to be conclusive. For in- 
stance, by measuring the cyclization probability in an 
unbiased mixture of such DNA segments, one can find 
(J), which is expected to follow a functional form very 
different from the Shimada-Yamakawa result. Similarly, 
the distribution function of the J-factor can be related 
to the statistics of the loop closing times. The latter 
could be measured e.g. by using the molecular-beacon- 
type technique [2^ . 

The numerical values of parameters k and a may be 
a possible source of uncertainty in our results. These 
values were based on the experimental data of Refs. [Tsj 
and [l^ . which appears to be quite unreliable. In partic- 
ular, the difference between the reported sets of exper- 
imental values determined on the basis of two different 
models ("Consensus" and "DNasel") is rather substan- 
tial and reaches a factor of two or three for certain trin- 
ucleotides JJJ . Given the exponential nature of the de- 
pendence of the J-factor on parameters k and a, one can 
expect a major numerical variation in our final results if 
a different experimental data set is employed. When we 
used the DNasel values instead of the Consensus ones, 
we did observe almost an order of magnitude variation 
of the J-factor for short chains. In the future, we ex- 
pect that more reliable experimental data will become 
available and more accurate estimates of k and a can be 
obtained. 

Our results suggest that in nature DNA looping may 
have been substantially optimized by the evolutionary se- 
lection of the appropriate sequences. By comparing the 
experiments on DNA looping and cyclization performed 
over natural sequences to our results for generic random 
sequences, one can estimate the significance of this evo- 
lutionary optimization. 
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APPENDIX 

The purpose of this appendix is to demonstrate that 
the additional boundary constraint r(L) = r(0) con- 
tributes a small correction to the bending energy of 
the loop, Eq. (jl2p . obtained under boundary conditions 
t{L) = t(0) only. 

Consider a looped DNA chain. Since the intrinsic cur- 
vature is assumed to be zero, the ground state of such a 
chain is confined to a plane. Let b be a unit vector nor- 
mal to that plane and 6{s) be the in-plane orientation 
angle of a chain segment at coordinate s. Let also f be 
the force acting between the two ends and t be the con- 
stant torque along the chain; both are needed to enforce 
boundary conditions ^ . The shape of the loop ground 
state is determined by the following equations: 



dt 



m 

ds 



b X t, 



80 T + h- (r(s) X f) T + (r(s) • f*) 



ds 



lp{s) 



lp{s) 



(A.l) 



(A.2) 



where we introduced f * = f x b. The ground state bend- 
ing energy can be formally expressed as a quadratic form 
with respect to t and f*: 



E 
kf 



2 [ds 



ds 



2lp{s) 



-ds : 



- M -v 



(A.3) 



Here we combined the torque and the force into a single 
"vector" -iF = (r, f * ) and introduced "matrix" M that 
depends on the loop shape: 



M = 



1 X Y 
X X2 XY 
Y XY Y2 



ds 
lp{s)' 



(A.4) 



In the last equation, X{s) and Y{s) are the in-plane com- 
ponents of the chain conformation r(s). They should not 
be confused with "components" of "vector" v, which we 
will distinguish by indices 1, 2, and 3 below. "Compo- 
nent" 1 of that "vector" corresponds to the torque, and 
"components" 2 and 3 correspond to the two in-plane 
components of the "force" f * . 

Now we need to choose the torque and the force in 
such a way that boundary conditions are satisfied. In 
particular, the tangential constraint imposes: 



^ ^ds - — — 



ds 



kT dr 



M -v 



2tt. 



(A.5) 



The other constraint, u = r(L) — r(0) = 0, is automati- 
cally satisfied in the uniform case [lp{s) = const], when 
the ground state of the loop is a circle dd/ds = 2tt/L. 
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In a moderately non-uniform case, one can express u in 
terms of small deviations of dO/ds from its average value 
2tt/L: 



b X — / v(s)ds — 0. 



For convenience, we choose the origin to be in the center 
of mass of the loop. In this case r{s)ds = and the 
last condition reduces to 



(A.7) 



We can now calculate the overall energy by substitut- 
ing = (27r,0,0) • into Eq. (|X3|l : 

A . M (m-a 

kT 2 V J 11 



-tt'2 - 



tt2 ' 



2tt^L / 2XY XY - X'^Y^ - X^Y 



X2 y2 _ XY 



(A., 



Here the bar denotes averaging for a given sequence with 
a weight factor proportional to \/lp{s), for example: 



Xds 



xL Jq Ip (s) 



(A.9) 



parameter x is defined in Eq. ((12)) . Note that in the 
uniform case X = Y = XY = Oa.ndX^ = Y^ = L^/Stt^. 
Retaining only the leading terms in disorder, one can 
obtain from Eq. (|A.8|) : 



E 
kf 



(A. 6) Introducing 



W = X + iY: 



xL 



Ir. 



1 + 87r^ 



,X +Y 



L2 



(A.IO) 



xL Jq 2Trlp{s) 



exp 



27ris 



one can re- write the previous result as 



E 
kf 



2ti^I 



xL 



^(1 + A): 



ds, (A.ll) 



(A.12) 



where A = Stt'^ww* / L'^ and w* is the complex conju- 
gate of w. The dominant term in the last expression, 
2Tr'^lp/xL, coincides with our result obtained with- 
out accounting for the additional constraint r{L) = r(0). 
Thus, constraining positions of the ends in addition to 
constraining their orientations adds only a correctional 
term to the bending energy of the ground state. The 
ensemble average of correction A is given by: 



(A) 



8^ 



2aln 



[WW 



(A.13) 



Since a ~ 0.0025, correction (A) is less than 1% for all 
chain lengths L > lp/2. 
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